
FIGURE 6.10
Our LWS-Det. From left to right: the input, the search process, and the learning process. For a given 1-bit convolutional layer, LWS-Det first searches for the binary weights (+1 or −1) by minimizing an angular loss supervised by a real-valued teacher detector; it then learns the real-valued scale factor α to enhance feature representation ability.

where $\otimes$ is the convolution operation. We omit the batch normalization (BN) and activation layers for simplicity. The 1-bit model aims to quantize $\mathbf{w}_i$ and $\mathbf{a}_i$ into $\hat{\mathbf{w}}_i \in \{-1, +1\}$ and $\hat{\mathbf{a}}_i \in \{-1, +1\}$, so that efficient xnor and bit-count operations can replace the full-precision ones. Following [99], the forward process of the 1-bit CNN is

$$\mathbf{a}_i = \operatorname{sign}(\hat{\mathbf{a}}_{i-1} \circledast \hat{\mathbf{w}}_i), \tag{6.66}$$

where $\circledast$ represents the xnor and bit-count operations and $\operatorname{sign}(\cdot)$ denotes the sign function, which returns $+1$ if its input is greater than zero and $-1$ otherwise. This binarization introduces an error, visible in Figs. 6.11(a) and (b): the output of the 1-bit convolution (b) fails to match that of its real-valued counterpart (a) in both angle and amplitude.
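To make the xnor-and-bit-count substitution concrete, here is a minimal PyTorch sketch (all names are ours, not from the text) verifying that a dot product over $\{-1, +1\}$ values equals the xnor-plus-popcount form evaluated on raw bits:

```python
import torch

def binarize(x):
    # sign function of Eq. 6.66: +1 if x > 0, -1 otherwise
    return torch.where(x > 0, torch.ones_like(x), -torch.ones_like(x))

a = torch.randn(64)                 # real-valued activations (one spatial position)
w = torch.randn(64)                 # real-valued weights
a_hat, w_hat = binarize(a), binarize(w)

# Full-precision dot product over the binarized operands ...
dot_fp = torch.dot(a_hat, w_hat)

# ... equals the xnor + bit-count form computed on {0, 1} bits:
a_bits, w_bits = a_hat > 0, w_hat > 0
agree = (~(a_bits ^ w_bits)).sum()  # xnor, then popcount
dot_xnor = 2 * agree - a.numel()    # rescale the count back to +/-1 arithmetic

assert torch.isclose(dot_fp, dot_xnor.float())
```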

Substantial efforts have been made to reduce this error. [199, 228] formulate the objective as

$$L_i^w = \left\| \mathbf{w}_i - \alpha_i \circ \hat{\mathbf{w}}_i \right\|_2^2, \tag{6.67}$$

where $\circ$ denotes channel-wise multiplication and $\alpha_i$ is the vector of channel-wise scale factors. As in Fig. 6.11(c), [199, 228] learn $\alpha_i$ by directly optimizing $L_i^w$ toward 0, which gives the explicit solution

$$\alpha_i^j = \frac{\|\mathbf{w}_i^j\|_1}{C_{i-1} \cdot K_i^j \cdot K_i^j}, \tag{6.68}$$

where $j$ denotes the $j$-th channel of the $i$-th layer. Other works [77] dynamically evaluate Eq. 6.67 rather than solving it explicitly, or modify $\alpha_i$ into other forms [26].
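Equation 6.68 states that the optimal scale factor of each output channel is simply the mean absolute value of its weights. A short sketch of both the closed-form solution and the residual of Eq. 6.67 (PyTorch; the layer shape is illustrative):

```python
import torch

def channelwise_alpha(w):
    # Eq. 6.68: alpha_i^j = ||w_i^j||_1 / (C_{i-1} * K_i^j * K_i^j),
    # i.e., the mean absolute weight of output channel j
    c_out, c_in, k, _ = w.shape
    return w.abs().sum(dim=(1, 2, 3)) / (c_in * k * k)

w = torch.randn(16, 8, 3, 3)        # weight tensor: (C_i, C_{i-1}, K, K)
alpha = channelwise_alpha(w)        # one scale factor per channel: shape (16,)
w_hat = torch.where(w > 0, torch.ones_like(w), -torch.ones_like(w))

# Residual of the reconstruction objective in Eq. 6.67:
err = ((w - alpha.view(-1, 1, 1, 1) * w_hat) ** 2).sum()
```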

Previous work mainly focuses on kernel reconstruction but neglects angular information, as shown in Fig. 6.11(d). One drawback of existing methods lies in their ineffectiveness when binarizing very small floating-point values, as shown in Fig. 6.11. In contrast, we leverage the strong capacity of a differentiable search to fully explore the binary space for an ideal combination of $-1$ and $+1$, without an ambiguous binarization process involved.
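As rough intuition for how a choice between $-1$ and $+1$ can be made differentiable, the hypothetical sketch below relaxes each binary weight into a softmax over the two candidate values and minimizes a cosine (angular) loss against a stand-in teacher signal. The setup, names, and loss here are illustrative only; the actual LWS-Det search is formulated in the next subsection.

```python
import torch
import torch.nn.functional as F

candidates = torch.tensor([-1.0, 1.0])           # the binary search space
logits = torch.randn(8, 2, requires_grad=True)   # per-weight choice logits (8 weights)
teacher = torch.randn(8)                         # stand-in for real-valued teacher weights

opt = torch.optim.Adam([logits], lr=0.1)
for _ in range(200):
    probs = F.softmax(logits, dim=-1)            # differentiable selection weights
    w_soft = probs @ candidates                  # expected weight value in [-1, 1]
    loss = 1 - F.cosine_similarity(w_soft, teacher, dim=0)  # angular loss
    opt.zero_grad()
    loss.backward()
    opt.step()

w_binary = candidates[logits.argmax(dim=-1)]     # discretize once the search ends
```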

6.4.2 Formulation of LWS-Det

We regard the 1-bit object detector as a student network, which is searched and learned layer by layer under the supervision of a teacher network (a real-valued detector). Our overall framework is